Efficiently Computing Arbitrarily-Sized Robinson-Foulds Distance Matrices

نویسندگان

  • Seung-Jin Sul
  • Grant R. Brammer
  • Tiffani L. Williams
چکیده

In this paper, we introduce the HashRF(p, q) algorithm for computing RF matrices of large binary, evolutionary tree collections. The novelty of our algorithm is that it can be used to compute arbitrarilysized (p × q) RF matrices without running into physical memory limitations. In this paper, we explore the performance of our HashRF(p, q) approach on 20,000 and 33,306 biological trees of 150 taxa and 567 taxa trees, respectively, collected from a Bayesian analysis. When computing the all-to-all RF matrix, HashRF(p, q) is up to 200 times faster than PAUP* and around 40% faster than HashRF, one of the fastest all-to-all RF algorithms. We show an application of our approach by clustering large RF matrices to improve the resolution rate of consensus trees, a popular approach used by biologists to summarize the results of their phylogenetic analysis. Thus, our HashRF(p, q) algorithm provides scientists with a fast and efficient alternative for understanding the evolutionary relationships among a set of trees.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Optimal algorithms for computing the Robinson and Foulds topologic distance between two trees and the strict consensus trees of k trees given their distance matrices

It has been postulated that existing species have been linked in the past in a way that can be described using an additive tree structure. Any such tree structure reflecting species relationships is associated with a matrix of distances between the species considered and called a distance matrix or a tree metric matrix. A circular order of elements of X corresponds to a circular (clockwise) sca...

متن کامل

Algorithms for Computing Cluster Dissimilarity between Rooted Phyloge- netic Trees

Phylogenetic trees represent the historical evolutionary relationships between different species or organisms. Creating and maintaining a repository of phylogenetic trees is one of the major objectives of molecular evolution studies. One way of mining phylogenetic information databases would be to compare the trees by using a tree comparison measure. Presented here are a new dissimilarity measu...

متن کامل

Comparison of Additive Trees Using Circular Orders

It has been postulated that existing species have been linked in the past in a way that can be described using an additive tree structure. Any such tree structure reflecting species relationships is associated with a matrix of distances between the species considered which is called a distance matrix or a tree metric matrix. A circular order of elements of X corresponds to a circular (clockwise...

متن کامل

Faster Computation of the Robinson-Foulds Distance between Phylogenetic Networks

The Robinson–Foulds distance, a widely used metric for comparing phylogenetic trees, has recently been generalized to phylogenetic networks. Given two phylogenetic networks N1, N2 with n leaf labels and at most m nodes and e edges each, the Robinson–Foulds distance measures the number of clusters of descendant leaves not shared by N1 and N2. The fastest known algorithm for computing the Robinso...

متن کامل

The Generalized Robinson-Foulds Metric

The Robinson-Foulds (RF) metric is arguably the most widely used measure of phylogenetic tree similarity, despite its well-known shortcomings: For example, moving a single taxon in a tree can result in a tree that has maximum distance to the original one; but the two trees are identical if we remove the single taxon. To this end, we propose a natural extension of the RF metric that does not sim...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008